Protein remote homology detection based on auto-cross covariance transformation
Abstract
Protein remote homology detection is a critical step toward annotating a protein's structure and function. Supervised learning algorithms such as support vector machines (SVMs) are currently the most accurate methods. Position-specific score matrices (PSSMs) contain rich information about the evolutionary relationships of proteins. However, because proteins differ in length, so do their PSSMs, which makes them difficult to use directly with machine-learning methods that expect fixed-length input. In this study, a simple, fast and powerful method is presented for protein remote homology detection, which combines a support vector machine with auto-cross covariance transformation. The PSSMs are converted into a series of fixed-length vectors by auto-cross covariance transformation, and these vectors are then input to a support vector machine classifier for remote homology detection. This scheme effectively captures sequence-order effects. Experiments are performed on well-established datasets, with remote homology simulated at the superfamily and fold levels, respectively. The results show that the proposed method, referred to as ACCRe, is comparable to or even better than state-of-the-art methods in terms of detection performance, and its time complexity is superior to that of other profile-based SVM methods. The auto-cross covariance transformation provides a novel way to use evolutionary information and can be widely applied in protein-level studies.
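The conversion of a variable-length PSSM into a fixed-length vector can be sketched as follows. The abstract does not state the formula, so this sketch assumes the commonly used definition of auto-cross covariance: for each lag, the mean-centered scores of one column are multiplied with the mean-centered scores of the same column (auto covariance) or a different column (cross covariance) at positions that lie `lag` residues apart, and the products are averaged. The function name `acc_transform`, the parameter `max_lag`, and the toy matrices are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def acc_transform(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Map an L x N score matrix to a vector of length N * N * max_lag.

    Assumed definition (not confirmed by the abstract): for columns a, b
    and a lag g, the feature is the mean over positions j of
    (S[j][a] - mean_a) * (S[j + g][b] - mean_b).
    a == b gives the auto covariance terms, a != b the cross covariance terms.
    """
    length = len(pssm)
    if max_lag < 1 or length <= max_lag:
        raise ValueError("need 1 <= max_lag < number of rows")
    n_cols = len(pssm[0])

    # Mean score of each column over the whole sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    # Center every column once so the inner loop only multiplies.
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs at this lag
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features.append(total / span)
    return features


# Toy matrices with 3 columns (a real PSSM has 20) and different lengths.
short_pssm = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0],
              [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
long_pssm = short_pssm + [[3.0, 0.0, 1.0], [1.0, 1.0, 1.0], [2.0, 0.0, 3.0]]

short_vec = acc_transform(short_pssm, max_lag=2)
long_vec = acc_transform(long_pssm, max_lag=2)
print(len(short_vec), len(long_vec))  # both vectors have the same length
```

Under this assumed definition the output length depends only on the number of columns and on `max_lag`, not on the protein length, so a 20-column PSSM yields 400 values per lag. Vectors of this kind could then be passed to any standard SVM implementation.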
Related articles
Structure-based Kernel for Remote Homology Detection
Remote homology detection is a central problem in computational biology. Currently, the most effective tools for addressing this problem are kernel-based discriminative methods employing support vector machines. These methods work by transforming the protein sequences into (a possibly high-dimensional) vector space, called feature space, and deriving a kernel function in the feature space, whic...
Detecting Remote Protein Evolutionary Relationships via String Scoring Method
The amount of the information being churned out by the field of biology has jumped manifold and now requires the extensive use of computer techniques for the management of this information. The predominance of biological information such as protein sequence similarity in the biological information sea is key information for detecting protein evolutionary relationship. Protein sequence similarit...
dRHP-PseRA: detecting remote homology proteins using profile-based pseudo protein sequence and rank aggregation
Protein remote homology detection is an important task in computational proteomics. Some computational methods have been proposed, which detect remote homology proteins based on different features and algorithms. As noted in previous studies, their predictive results are complementary to each other. Therefore, it is intriguing to explore whether these methods can be combined into one package so...
Profile-based direct kernels for remote homology detection and fold recognition
MOTIVATION Protein remote homology detection is a central problem in computational biology. Supervised learning algorithms based on support vector machines are currently one of the most effective methods for remote homology detection. The performance of these methods depends on how the protein sequences are modeled and on the method used to compute the kernel function between them. RESULTS We...
Using Amino Acid Physicochemical Distance Transformation for Fast Protein Remote Homology Detection
Protein remote homology detection is one of the most important problems in bioinformatics. Discriminative methods such as support vector machines (SVM) have shown superior performance. However, the performance of SVM-based methods depends on the vector representations of the protein sequences. Prior works have demonstrated that sequence-order effects are relevant for discrimination, but little ...
Journal:
- Computers in Biology and Medicine
Volume 41, Issue 8
Pages: -
Publication date: 2011